-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganize be v1.0 #54
base: 34-improvement-backend-improvements-for-typing-tests-modularity-and-pathlib
Are you sure you want to change the base?
Conversation
*Optimization add type hints to all the functions *Optimization replace OS path function with pathlib functions <Signed off by Prakash([email protected])>
*Optimization: Create new folder and create app.py, mediahandler.py response_handler.py and utils.py *Modify: Modify main.cpp to take the working directory into consideration, and avoid missing config.json *Modify: modify config.json to point towards the correct directory <Signed off by Praksh([email protected])>
*New : Create different function in response_handler to ensure consistent JSON communicatino *Change: In C++ use the std:out to return JSON object to backend *Change: Move all the logs to the std:err, to ensure std:out is only used for communication *Future: Keep the original app.py redundantly for future reference *Modify: Modify the frontend to be ensure compatibility with the New JSON communication scheme
*Add: Add unit tests for all static functions in utils.py and media_handler.py *improve: Improve the logs, remove irrelevant logs *refactor: remove unused import statements, change structure <Signed off by Prakash([email protected])>
*Task: Merge all the changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @prakash-2002-ramanathan, thanks for this PR!
I skimmed through the changes, and I'd like to touch on a few points:
- CLI nature of core: For the moment, we need to keep this capability. There is an open issue assigned to build a proper CLI for the core, but until that’s addressed, it needs to remain independently usable.
- Keeping
main
clean:main
should serve strictly as a simple entry point to the application, without containing any processing logic. Please encapsulate these into a class likeCommunicationHandler
and simply use its methods on main. Here's an example of what I had on my mind:
This:
if (argc != 2) {
std::cerr << "Usage: " << argv[0] << " <video_file_path>" << std::endl;
return 1;
}
would become:
if (argc != 2) {
CommunicationHandler::handleError("Usage: ...");
}
and then within handleError()
you'd call a member function to generate a json response and exit with a sensible code. This would allow us to abstract communication handling and improve over time while keeping the public API stable.
I understand this is not true IPC, but as I explained in #34, the immediate goal is to eliminate the existing error-prone implementation. Improving how we handle responses within core will already be a big step in the right direction.
- Config file path changes: I didn't quite understand why adding the
../
before MediaProcessor paths were needed there. Could you clarify why this was necessary?
@omeryusufyagci, The file path in python is relative to the python file. But since the MediaHandler is outside the backend directory. We are going to the previous directory where the Mediahandler is present. This seemed like a simple fix |
Hi @prakash-2002-ramanathan, I wanted to check with you how you're progressing. Although I appreciate the effort directly on the core, #34 mainly addresses the backend, where additions like the response handler will be ready in place, pending an update of the core for full integration. You can still tackle the update of the core, but I'd propose to do it on a separate PR, and focus here fully on the backend improvements. The backend should function the same as it does right now, while the necessary handlers implemented and ready for the integration. Tests are a big part of this PR, and would be quite impactful as we get ready to make larger changes. Please let me know if this sounds good to you. Thanks! |
Hi @omeryusufyagci |
Hi @prakash-2002-ramanathan, that's great to hear! I would still like you to submit your work on the core as a separate PR though. Let's evaluate the work you've done on the backend, finalize, merge and then take a look at your work on the core separately. This is also to expedite the addition of backend tests in preparation of our first release. Finally, could you please target this PR to the dedicated branch opened under #34? Thanks! Looking forward to see the changes! |
*(deletion) Original app.py *(Updation) requirements *(modification) make changes to preserve the CLI nature and only make the backend changes
*(Fix) Add init files in python directory, so that the directories are considered as modules *(Chek) Make sure that the backend is functioning just like how it used to
Hi @omeryusufyagci , have stashed all the changes made in the core, only the changes in backend have been pushed. You can run the new app by going to the backend directory and then running app.py. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @prakash-2002-ramanathan, thanks for the changes!
We're on the right track, but there are still a few things to address. Also your changes on the core are still visible on the PR.
I only reviewed the backend changes, and will review the tests once the backend part is finished.
In addition to the review comments, I'd also like you to keep the app.py
at project root, but this will be a minimal entry point for the backend. So 2-3 lines just to instantiate thje backend service and run it. This is to ensure we keep our current ease of use. Please feel free to make the necessary changes to faciliatate this as well.
Many thanks!
backend/media_handler.py
Outdated
# Parse the output to get the processed video path (TODO: encapsulate) | ||
for line in result.stdout.splitlines(): | ||
if "Video processed successfully" in line: | ||
processed_video_path = line.split(": ", 1)[1].strip() | ||
|
||
# Remove any surrounding quotes (TODO: encapsulate) | ||
if processed_video_path.startswith('"') and processed_video_path.endswith('"'): | ||
processed_video_path = processed_video_path[1:-1] | ||
processed_video_path = str(Path(processed_video_path).resolve()) | ||
logging.info(f"Processed video path returned: {processed_video_path}") | ||
return processed_video_path | ||
|
||
return None | ||
except Exception as e: | ||
logging.error(f"Error running C++ binary: {e}") | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part of the backend refactor, please address the 2 todo messages in the code. Mainly we want to organize common utils under a utils class/file.
backend/response_handler.py
Outdated
return jsonify(response), 200 | ||
|
||
@staticmethod | ||
def error(message: str, status_code: int = 400) ->tuple[Response, int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe here the return type should be simply a Response type object. Flask understands it carries an error code, but it's not necessary to hint int
on the return type.
backend/response_handler.py
Outdated
def success(message: str, data: dict = None) ->Response: | ||
"""Return a consistent JSON response for success.""" | ||
response = { | ||
"status": "success", | ||
"message": message, | ||
"data": data or {} | ||
} | ||
return jsonify(response), 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method names should be descriptive. Please use generate_success_response()
, and similar for others. Thinking from the users perspective can help; ResponseHandler.succes()
vs ResponseHandler.generate_success_response()
backend/response_handler.py
Outdated
@staticmethod | ||
def core_data_passer(message: str, data: dict = None) -> str: | ||
"""Return a JSON-formatted response for core components with provided data.""" | ||
core_response = { | ||
"status": "success", | ||
"message": message, | ||
"data": data or {} | ||
} | ||
return dumps(core_response) | ||
#You can only use jasonify for http communication to send the data as a string use json.dumps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make this a bit more clearer? Even for me it's abit difficult to understand what our intention is with this method. Is this a bidirectional serializer, or?.. Both in the method name and the docstring we should be able to immediately understand the purpose of this method.
config.json
Outdated
"deep_filter_path": "MediaProcessor/res/deep-filter-0.5.6-x86_64-unknown-linux-musl", | ||
"deep_filter_tarball_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx.tar.gz", | ||
"deep_filter_encoder_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/enc.onnx", | ||
"deep_filter_decoder_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/df_dec.onnx", | ||
"deep_filter_path": "../MediaProcessor/res/deep-filter-0.5.6-x86_64-unknown-linux-musl", | ||
"deep_filter_tarball_path": "../MediaProcessor/res/DeepFilterNet3_ll_onnx.tar.gz", | ||
"deep_filter_enoder_path": "../MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/enc.onnx", | ||
"deep_filter_decoder_path": "../MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/df_dec.onnx", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be handled on the backend side. The config represents the path relation of these files wrt the project root.
Hi @omeryusufyagci , fixed all your requested changes. Please have a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @prakash-2002-ramanathan, thanks for the changes. I took a look and there are a few things I'd like you to address. Could you please take a look? Thanks a lot!
backend/response_handler.py
Outdated
@staticmethod | ||
def generate_success_response(message: str, data: dict = None) ->Response: | ||
"""Return a consistent JSON response for success.""" | ||
response = { | ||
"status": "success", | ||
"message": message, | ||
"data": data or {} | ||
} | ||
return jsonify(response), 200 | ||
|
||
@staticmethod | ||
def generate_error_response(message: str, status_code: int = 400) ->Response: | ||
"""Return a consistent JSON response for errors.""" | ||
response = { | ||
"status": "error", | ||
"message": message, | ||
"data": {} | ||
} | ||
return jsonify(response), status_code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here please add a generate_response()
function to hold shared behavior, that is, generating a response.
That new method should take in any data available with the status code, etc. So you would call the generate_response()
from generate_success_response()
, etc.
backend/tests/README.md
Outdated
`cd /home/prakash/fast-music-remover` | ||
|
||
`PYTHONPATH=$(pwd) python -m unittest discover -s backend/tests -p 'test_*.py` | ||
|
||
Run the above two commands to test the code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should use pytest here, also as a side note, usually you would do:
cd /home/`whoami`/...
which would work for anyone using a Unix-based OS.
Let's go with pytest regardless
templates/index.html
Outdated
if (data.status === "completed") { | ||
document.getElementById("videoPlayer").src = data.video_url; | ||
if (data.status === "success") { | ||
document.getElementById("videoPlayer").src = data.data.video_url; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data.data typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No @omeryusufyagci , in the response object that is being returned, the video_url is wrapped inside another data object. Do you have any suggestions on how to improve this ?
I have also asked questions on your other comments. Please answer them too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also because data is used both in the js and the python side. You could rename the data field of the json object to payload
which is quite std to use.
config.json
Outdated
@@ -1,7 +1,7 @@ | |||
{ | |||
"deep_filter_path": "MediaProcessor/res/deep-filter-0.5.6-x86_64-unknown-linux-musl", | |||
"deep_filter_tarball_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx.tar.gz", | |||
"deep_filter_encoder_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/enc.onnx", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
@@ -58,4 +58,4 @@ int main(int argc, char* argv[]) { | |||
|
|||
std::cout << "Video processed successfully: " << processedMediaPath << std::endl; | |||
return 0; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please run the clang (with the file available on project root, i.e. .clang-format) for MediaProcessor/src or setup the pre-commit hooks which would do this for you
backend/tests/test_media_handler.py
Outdated
@patch("backend.media_handler.yt_dlp.YoutubeDL") | ||
def test_download_media(self, mock_yt_dlp): | ||
# Mock YoutubeDL instance | ||
mock_ydl_instance = MagicMock() | ||
|
||
# Mock extract_info for download=False and download=True | ||
mock_ydl_instance.extract_info.side_effect = [ | ||
{"title": "Test Video", "ext": "mp4"}, # For download=False | ||
{"ext": "mp4"} # For download=True | ||
] | ||
|
||
# Set the return value when instantiating YoutubeDL | ||
mock_yt_dl_context_manager = MagicMock() | ||
mock_yt_dl_context_manager.__enter__.return_value = mock_ydl_instance | ||
mock_yt_dlp.return_value = mock_yt_dl_context_manager | ||
|
||
# Test download_media method | ||
result = MediaHandler.download_media(self.video_url, self.base_directory) | ||
expected_file = Path(self.base_directory) / "Test_Video.mp4" | ||
self.assertEqual(result, str(expected_file.resolve())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes due to rate limiting yt-dlp takes minutes to finish. Please add a timeout, e.g. 10 seconds, here for getting the info which should be rather fast irrespective of media size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey do you want to add the time out on the main code too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes would be good
backend/tests/test_media_handler.py
Outdated
@patch("backend.media_handler.subprocess.run") | ||
def test_process_with_media_processor_success(self, mock_subprocess_run): | ||
# Mock subprocess.run to simulate a successful response | ||
mock_subprocess_run.return_value = MagicMock( | ||
returncode=0, | ||
stdout="Video processed successfully: /path/to/processed_video.mp4", | ||
stderr="" | ||
|
||
) | ||
# Test process_with_media_processor | ||
result = MediaHandler.process_with_media_processor( | ||
self.video_path, self.base_directory, self.config_path | ||
) | ||
|
||
self.assertEqual(result, "/path/to/processed_video.mp4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's happening here with the assertion on l.55?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see here for an example on how a similar test was made for the core. You could reuse the same utility, just need to convert it to python code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, would appreciate if you could elaborate. In the files that you have attached, you did not mock anything, you created a processed file and compared byte by byte is that what you expect ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, /path/to/...
has to be a real path first. Regarding what we did in the core: we have a known truth, i.e. a processed video that we manually verified by watching it as a reference of what proper filtering is doing to a known source. We then do a byte by byte comparison with that known truth to see if functionality is in tact.
Byte-by-byte is definitely not the best way to do this, but it was a fast solution that actually works with a good tolerance. To be improved, but better than nothing
backend/tests/test_media_handler.py
Outdated
def setUp(self): | ||
"""Set up base directory and mock paths for testing.""" | ||
self.base_directory = tempfile.mkdtemp() | ||
self.video_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't place data inside of the code. That later would require code updates for simple test data changes. These should be placed in appropriate test data files. You can see what has been done for the MediaProcessor for examples.
backend/tests/test_media_handler.py
Outdated
self.config_path = "/path/to/config.json" | ||
self.video_path = "/path/to/video.mp4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what these mean?
Hey @omeryusufyagci , could you explain how do you want this |
Hey @prakash-2002-ramanathan, you could try to test via the extracted metadata instead of the video itself to test if yt-dlp works fine. If we can get the metadata, we should likely be OK. You could, however, mock it as well. Byte-by-byte is not the best, I know... I'll open an issue about that. Thanks! |
Hello @omeryusufyagci, I have addressed all your requested changes. Please have a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @prakash-2002-ramanathan, thank you for the updates.
I've also updated branch 34-...
to reduce the amount of clutter as it was getting hard to review.
I noticed the changes for core are still present in this PR. Could you please rebase your commit history to remove them from here? We should focus on those on a separate PR after this is finished.
Regarding the updates, in addition to the comments I left, here are some more global remarks:
- Please try to ensure the test names reflect exactly what we're testing for and what do we expect in return.
- Try to utilize more the features of the testing framework such as fixtures, and parameters. This is to make it easier to extend these tests later.
I appreciate your effort on this and looking forward to merge it soon! In the meantime, have you noticed we have a discord now? If you have any questions, or just want to say hello, pass by!
from typing import Optional | ||
|
||
|
||
class Utils: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all member functions are static, there doesn't need to be a class. Please revise it as a module and update the implementations accordingly
DOWNLOAD_URL = "https://www.youtube.com/watch?v=TK4N5W22Gts" | ||
DEEP_FILTER_PATH = "MediaProcessor/res/deep-filter-0.5.6-x86_64-unknown-linux-musl" | ||
TEST_VIDEO_PATH = "Media/test.webm" | ||
PROCESSED_VIDEO_PATH = "Media/test_processed.mp4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move these to the top of the file
@pytest.fixture | ||
def test_directory(): | ||
"""Fixture to set up and tear down a temporary directory.""" | ||
test_dir = tempfile.mkdtemp() | ||
yield test_dir | ||
for root, dirs, files in os.walk(test_dir, topdown=False): | ||
for name in files: | ||
os.remove(os.path.join(root, name)) | ||
for name in dirs: | ||
os.rmdir(os.path.join(root, name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please strictly use pathlib.Path instead of os everywhere for uniformity.
Also let's try to get rid of those loops. shutil.rmtree()
should work
def test_sanitize_filename(): | ||
# Test filename sanitization | ||
assert Utils.sanitize_filename("test@file!.mp4") == "test_file_.mp4" | ||
assert Utils.sanitize_filename("valid_file-name.mp4") == "valid_file-name.mp4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For here and possibly other relevant tests, please use @pytest.mark.parametrize()
to define the test cases which would allow using a single assertion per test function. In general, we should try to avoid having multiple assertions per test.
def test_get_processed_video_path(): | ||
# Test with a valid string that includes the correct success message | ||
valid_processed_string = Utils.get_processed_video_path( | ||
["Video processed successfully: valid_file-name"] | ||
) | ||
assert valid_processed_string == "valid_file-name" | ||
|
||
# Test with an invalid string that does not contain the success message | ||
invalid_processed_string = Utils.get_processed_video_path( | ||
["Video processed successful"] | ||
) | ||
assert invalid_processed_string is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is an important test, could you please add better coverage against edge cases. Empty list, strangely formatted messages, etc.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError | ||
from pathlib import Path | ||
from backend.media_handler import MediaHandler | ||
import utils as utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intentional?
BASE_DIR = Path(__file__).resolve().parents[2] | ||
TEST_DIR = Path(__file__).parent.resolve() | ||
DEEPFILTERNET_PATH = str((BASE_DIR / utils.DEEP_FILTER_PATH).resolve()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to have a pytest.fixture
to handle these.
@prakash-2002-ramanathan please note the update on the branch you're targeting that was required to add urgent functionalities. Please ensure the added functionality is kept in this PR. Thank you! |
Hello @omeryusufyagci
This PR addresses #34.
Please see if the current work is satisfactory
Possible issues:
config.json
config_file_path
in MediaHandlermain
to pass theconfig.json
file directory.I have not removed the original app.py. Not updated the README. Have not updated the requiremets.pip, some minor work is pending.